---
title: Self-hosting Moondream
description: "Use a locally running or self-deployed Moondream server"
icon: orbit
---

If you'd rather run Moondream yourself than use their cloud offering, there are many ways to do so depending on your needs:

1. 💻 Run **locally** on the same machine where you're running Magnitude tests
    -  Great for local development on high-end machines
2. 🔒 **Self-host** on your own infrastucture or VPC
    -  When compliance and security are essential
3. ⚡ **Self-host** with GPUs on a cloud platform like [Modal](https://modal.com)
    -  Perfect for quickly booting up a fast Moondream server

Select the appropriate option below for more instructions!

<br/>

<Tabs>
  <Tab title="Running Locally">
    ## Running Moondream locally

    To run Moondream on your local machine, first go to https://moondream.ai/c/moondream-server and download the appropriate server (currently Moondream only provides macOS/Linux executables).

    Follow the instructions on that page to run the executable.

    Then you can configure `magnitude.config.ts` with the local base URL:
    ```ts
    import { type MagnitudeConfig } from 'magnitude-test';

    export default {
        url: "http://localhost:5173",
        executor: {
            provider: 'moondream',
            options: {
                baseUrl: 'http://localhost:2020/v1'
            }
        }
    } satisfies MagnitudeConfig;
    ```

    That's it! Now you can run tests with local Moondream.

    <Warning>Local inference is very slow on CPU, newer macOS machines or a GPU is recommended for this, otherwise you may want to [deploy on modal](#self-hosting-moondream-on-modal) instead</Warning>

  </Tab>

  <Tab title="Hosting on VPC">
    ## Hosting in a VPC
    If your company has strict requirements around where infrastructure needs to be hosted, you may need to host Moondream inside your company's VPC on your cloud provider of choice.

    The specific setup required will depend on the cloud provider.

    However generally speaking these are the requirements:
    1. A compute instance in your VPC with a GPU
    2. A linux distribution with CUDA configured on the compute instance
    3. Run the [Moondream server](https://moondream.ai/c/moondream-server) executable on this instance 
    4. Ensure that the instance is accessible via port 2020 to your CI runners or dev machines, and configure test runners to connect to the appropriate URL:

    ```ts
    import { type MagnitudeConfig } from 'magnitude-test';

    export default {
        url: "http://localhost:5173",
        executor: {
            provider: 'moondream',
            options: {
                baseUrl: 'https://some-host-in-my-vpc/v1'
            }
        }
    } satisfies MagnitudeConfig;
    ```

    If you are an enterprise trying to configure Magnitude in your company, feel free to reach out to us directly at founders@magnitude.run.
  </Tab>

  <Tab title="Hosting on Modal">
    ## Self-hosting Moondream on Modal

    Probably the easiest way to deploy Moondream is by hosting it on Modal, so we provide a [deployment script](https://github.com/magnitudedev/magnitude/blob/main/infra/moondream.py) to do this.

    ### Set up Modal
    1. Create an account at [modal.com](https://modal.com)
    2. Run `pip install modal` to install the modal Python package
    3. Run `modal setup` to authenticate (if this doesn’t work, try `python -m modal setup`)

    > More info: https://modal.com/docs/guide

    <Tip> Modal gives $30 in free credits per month.</Tip>

    ### Deploy Moondream

    Clone Magnitude and run the `moondream.py` deploy script to automatically download the Moondream server and model weights to a Modal volume and deploy the endpoint:
    ```sh
    git clone https://github.com/magnitudedev/magnitude.git
    cd magnitude/infra
    modal deploy moondream.py
    ```

    Your Moondream endpoint will then be available at `https://<your-modal-username>--moondream.modal.run/v1`.

    Then you can configure `magnitude.config.ts` with this base URL and start running tests powered by your newly deployed Moondream server!

    ```ts
    import { type MagnitudeConfig } from 'magnitude-test';

    export default {
        url: "http://localhost:5173",
        executor: {
            provider: 'moondream',
            options: {
                baseUrl: 'https://<your-modal-username>--moondream.modal.run/v1'
            }
        }
    } satisfies MagnitudeConfig;
    ```

    Keep in mind there may be some cold start times since Modal automatically scales down containers when not in use.

    <Warning> This Modal server will be publically accessible. We are working on a better way to make it configurable with a custom API key. </Warning>

    ### Customizing Deployment

    You may want to customize the modal deployment depending on your needs.

    You can modify the `moondream.py` deployment script to fit your needs.

    Common options you may want to change:
    - `gpu`: GPU configuration. See Modal's [pricing page](https://modal.com/pricing) for details on different available GPUs and their cost. Also see [comparison](#gpu-comparison) below.
    - `scaledown_window`: Time in seconds a container will wait to shut down after receiving no requests. If higher, will let you run tests after longer without needing the container to cold-start again.
    - `min_containers`: By setting this option you force Modal to keep some number of containers open to handle requests. This means Modal will also bill you for those containers all the time, but eliminates cold-starts.

    ### GPU Comparison

    Here's a breakdown of how quickly different GPUs available on Modal are able to handle typical requests that Magnitude makes to Moondream:

    | GPU         | Approximate inference time per action | Modal cost per hour |
    | ----------- | ------------------------------------------------ | ------------------- |
    | H100        | ~200ms                                           | $3.95               |
    | A100 (40GB) | ~300ms                                           | $2.10               |
    | A10G        | ~500ms                                           | $1.10               |
    | T4          | ~800ms                                           | $0.59               |

    Since Magnitude needs to wait a bit for the page to stabilize anyway, probably something like the A10G is a good price/performance balance, but any of these work well!

  </Tab>
</Tabs>

